with two different sets of item parameter estimates to study the effects on
criterion-related validity of scoring methods and/or item parameter estimates.
Criterion variables were high school and college grade-point averages (GPA)
and scores on the American College Testing Program (ACT) achievement tests.

Results indicated generally higher validities for the adaptive tests;
at least one method of scoring the stradaptive test resulted in higher
correlations than the conventional test with seven of the eight criterion
variables (and equal correlations for the eighth), even though the stradaptive
test administered over 25% fewer items, on the average, than did the
conventional test. The stradaptive test obtained a significantly higher
correlation with overall college GPA (r=.27) than did the conventional test;
when math GPA was partialled from overall GPA, the maximum correlation for
the stradaptive test with an average length of 29.2 items was r=.51, while
the 40-item conventional test correlated only .36. The data showed generally
higher criterion-related validities for the mean difficulty scores on
the stradaptive test in comparison to the Bayesian and maximum likelihood
scores; the different item parameter estimates had no effect on validity,
resulting in scores that correlated .98 with each other.

Although the mean length of the Bayesian adaptive test was 48.7 items,
the median number of items (35) was less than that of the 40-item conventional
test. Ability estimates from this adaptive test also correlated
higher with seven of the eight criterion variables than did scores on the
conventional tests, although none of the differences were statistically
significant.

These data indicate that adaptive tests can achieve criterion-related
validities equal to, and in some cases significantly greater than, those
obtained by conventional tests while administering up to 27% fewer items,
on the average. The data also suggest that latent-trait-based scoring of
stradaptive tests may not be optimal with respect to criterion-related
validity. Limitations of the study are discussed and suggestions are made
for additional research.

Criterion-Related Validity
of Adaptive Testing Strategies



Adaptive administration of ability and achievement tests promises considerable
improvement in the measurement of individual differences. Some of these advantages
were demonstrated in a series of theoretical studies by Lord (e.g., Lord, 1969,
1971a, 1971b) illustrating the potential of adaptive tests for measurement with
more equal precision throughout the range of measured ability than was possible
with conventional tests of comparable length. Later simulation studies (e.g., Betz
& Weiss, 1974, 1975; McBride & Weiss, 1976; Vale & Weiss, 1975b) that further
varied the characteristics of item pools used for adaptive tests and conventional
comparison tests supported these theoretical results, demonstrating that in comparison
to conventional tests, adaptive tests can measure with greater precision for a
fixed number of items or with equal precision but using considerably fewer items.
This finding has been observed in the measurement of both ability and achievement
(e.g., Bejar & Weiss, 1978; Bejar, Weiss, & Gialluca, 1977; Brown & Weiss, 1977;
Gialluca & Weiss, 1979).

Early live-testing studies comparing adaptive and conventional tests sought
evidence for increased precision of measurement in higher levels of reliability.
Because of problems in computing indices of internal consistency for adaptive
tests, these studies used test-retest reliability over short time intervals
to demonstrate higher levels of precision for adaptive tests. Data supporting
this hypothesis were obtained in a number of studies on the measurement of ability
(Betz & Weiss, 1973, 1975; Larkin & Weiss, 1974, 1975; Vale & Weiss, 1975) and
achievement (e.g., Koch & Reckase, 1979).

Although considerable research has thus been concerned with investigating the
increased precision of adaptive versus conventional tests, the validity of adaptive
testing procedures has also been of concern. The majority of validation evidence
has derived from computer simulation studies. In these studies, true ability (or
achievement) level is known, and an item characteristic curve (ICC) model in
conjunction with a set of ICC item parameters, a testing strategy, and a scoring
method is used to generate an estimated ability level. The estimated ability level
can then be correlated with the true, or generated, ability level to yield an index
of the validity or fidelity (Green, 1976) of measurement. This correlation indicates
how well the true ability level can be recaptured by the combination of item
pool, testing strategy, and scoring method. Data from a number of such simulation
studies (e.g., Betz & Weiss, 1974, 1975; Urry, 1970; Vale & Weiss, 1975) indicate
higher levels of validity for adaptive tests in comparison with conventional tests.
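
The logic of such a simulation can be sketched briefly. The following Python fragment is an illustration only and does not reproduce the procedure of any cited study: it assumes a three-parameter logistic ICC with a scaling constant of 1.7, a hypothetical 40-item peaked conventional test, number-correct scoring, and normally distributed true ability. The function names (p_correct, pearson, simulate_fidelity) and all parameter values are introduced here for illustration.

```python
import math
import random

def p_correct(theta, a, b, c=0.20):
    """Assumed 3PL logistic ICC: probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_fidelity(n_testees=500, n_items=40, seed=0):
    """Correlate known (generated) ability with number-correct scores."""
    rng = random.Random(seed)
    # Hypothetical peaked test: equal discriminations, difficulties near zero.
    items = [(1.0, rng.uniform(-0.3, 0.3)) for _ in range(n_items)]
    true_theta, scores = [], []
    for _ in range(n_testees):
        theta = rng.gauss(0.0, 1.0)  # true ability is known in a simulation
        score = sum(rng.random() < p_correct(theta, a, b) for a, b in items)
        true_theta.append(theta)
        scores.append(score)
    return pearson(true_theta, scores)

fidelity = simulate_fidelity()
```

Replacing the fixed item list with an adaptive item-selection rule and an ability-estimation method would give the corresponding fidelity coefficient for an adaptive strategy.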

The validity of adaptive tests has also been investigated in terms of correlations
of adaptive test scores with scores on conventional tests. Early studies of
this type were real-data simulation studies in which the administration of an adaptive
test was simulated using a set of item responses obtained from the prior
administration of a conventional test; items from the conventional test were
"readministered" to the same testee in an adaptive sequence, and the validity of the
procedure was determined by correlation of the score on the adaptive test with the
score on the parent conventional test (e.g., Cleary, Linn, & Rock, 1968a, 1968b;
Krathwohl & Huyser, 1956). This procedure is not really a demonstration of validity,
however, since the obtained correlation is merely a part-whole correlation
that will reach a value of 1.0 when the adaptive test administered includes all
items in the conventional test.
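
The part-whole artifact can be illustrated numerically. The sketch below uses simulated right/wrong responses (hypothetical data, not data from any cited study) and correlates the score on the first k items of a 40-item test with the total score; the names pearson and part_whole_correlations are introduced here for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def part_whole_correlations(n_testees=200, n_items=40, seed=1):
    """Correlate subset scores (first k items) with the parent-test total."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_testees):
        p = rng.uniform(0.2, 0.9)  # hypothetical success rate for this testee
        data.append([1 if rng.random() < p else 0 for _ in range(n_items)])
    total = [sum(row) for row in data]
    return {k: pearson([sum(row[:k]) for row in data], total)
            for k in (10, 20, n_items)}

r_by_length = part_whole_correlations()
```

When the subset contains every item of the parent test, the two scores are identical and the correlation is necessarily 1.0, whatever the quality of measurement.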

In other validity studies (e.g., Bayroff & Seeley, 1967; Hansen, 1969) two
independent tests measuring the same ability, one adaptive and one conventional, were
administered to the same group of testees. The validity of the adaptive test was
then evaluated by the correlation of scores on the two tests. Although this
approach implements currently accepted definitions of concurrent validity, it is
insufficient evidence for the validity of the adaptive procedure. The problem with
this method lies in evaluating the appropriate degree of correlation to be expected
between the two measurements (Weiss & Betz, 1973). A very high correlation between
the two test scores would indicate that the two tests were measuring equivalently;
yet a demonstration of equivalent measurement is not a demonstration of the
improvement of adaptive testing over conventional testing. If the correlation
between scores on the two tests is not very high, however, the question of which
procedure is measuring better can be raised. Thus, this approach to studying validity
results in an unresolvable dilemma.

As a partial resolution, the relative construct validity of adaptive versus
conventional testing strategies has been studied (Bejar & Weiss, 1978). Although
this approach is useful, it requires the precise specification of a nomological net
for its implementation and may not always result in clearly interpretable results
because of the measurement properties of other variables in that net.

For practical applications of adaptive testing, criterion-related validity
evidence will be most appropriate. However, the literature to date includes very few
criterion-related validity studies. Angoff and Huddleston (1958), using real-data
simulations, were the first to study the criterion-related validity of an adaptive
test. They examined the correlations with grade-point averages of several
two-stage tests in comparison to several conventional tests using items administered to
about 6,000 students from the College Entrance Examination Board's Scholastic
Aptitude Test. Their results indicated that the narrow-range (peaked) second-stage
tests of their simulated two-stage tests had slightly higher validities than did
the wide-range (rectangular) conventional tests constructed from the same item
pool.

Linn, Rock, and Cleary (1969) also studied the criterion-related validity of
adaptive and conventional tests. Their study used scores on the College Board
Achievement Tests in American History and English Composition, with the
verbal/mathematics tests of the Preliminary Scholastic Aptitude Test, as external
criteria. The verbal portion of the School and College Aptitude Tests and the
Sequential Tests of Educational Progress were administered to 4,885 testees and then,
using real-data simulation techniques, were rescored for approximately two-thirds
of the group for whom criterion information was available, using five different
adaptive testing procedures. The conventional comparison test was created from the
same 190-item pool.

Linn et al. (1969) found that the adaptive tests had higher correlations with
the criterion tests than did the conventional tests shortened to the length of the
adaptive tests. This study had the limitation of using a simulated adaptive
testing administration mode rather than live adaptive administration. This makes it
difficult to generalize the results to testees actually taking adaptive tests where
interaction effects may exist between testee response, item selection, and item
order. Also, this study was confounded by item overlap between the conventional
and adaptive tests.

Waters (1974, 1976), in his adaptive test validation study, also correlated
scores on adaptive and conventional tests with another test, which served as an
external criterion. His criterion was the Florida 12th Grade Verbal Test scores.
Waters divided his testee population into six groups: One group of 55 testees was
administered a stradaptive test (Weiss, 1973), and five smaller groups (N = 8, 7,
9, 12, and 10) were each given a different conventional test. One-fifth of the
items on the stradaptive test were the same as those on the conventional tests.
Although the scores for the five conventional subtests were different, they were
normalized and pooled for comparison with stradaptive results.



Waters found restriction in the range of ability level for his sample: Most
testees tended to be at the high end of the continuum. His results indicated that
none of the stradaptive validity coefficients were significantly different from the
conventional test validities; the results did show, however, that the shorter
stradaptive test proved more reliable than the longer conventional test. Thus,
with fewer items administered, the stradaptive test produced validity coefficients
comparable to conventional test validity coefficients.

The Angoff and Huddleston (1958), Linn et al. (1969), and Waters (1974, 1976)
studies were all criterion-related validity studies. The Angoff and Huddleston
(1958) and the Linn et al. (1969) studies were limited by the tests being scored as
if they were administered adaptively, introducing limitations created by the
simulation approach, and by some of the same items being used in both tests. Waters'
(1974, 1976) study eliminated one of these problems: He used live adaptive testing
and did not give the same subjects both the adaptive test and the conventional
test, even though one-fifth of the items were common between the two tests.
However, since his study was an independent groups design in which the adaptive and
conventional tests were administered to different groups of testees, he
introduced sample-specific error into his research design, particularly because of
the relatively small sample sizes used. An additional problem in Waters' study
results from the pooling of data from the five conventional subtests given to five
different groups of testees and the comparison of the pooled score distributions
with the adaptive test score distribution.

A problem characteristic of both the Linn et al. (1969) and the Waters (1974,
1976) studies was the use of scores on a conventional test as an external
criterion. Since one of the predictors was also a conventional test, this could have
introduced method variance in the correlation of the conventional predictor test
scores with the conventional criterion test scores, thus conceivably inflating
these validity coefficients. If such method variance was present, it would not
have similarly inflated the validity coefficients for the adaptive tests, possibly
masking gains in relative validity due to adaptive testing. The Angoff and
Huddleston (1958) study, however, used grade-point average as the criterion but did
not use actual adaptive test administration.



The present study was designed to investigate the relative validity of adaptive
and conventional testing strategies using non-test variables as one set of
external criteria. The study was similar to Waters' (1974, 1976) study in that the
adaptive tests were computer-administered; it was similar to the Linn et al. (1969)
study in that each group of testees took both an adaptive and a conventional test,
but there was no overlap in the item pools used for the two testing strategies.

METHOD

Two adaptive testing strategies were compared to a conventional ability test
in terms of criterion-related validity for two separate groups of students. In one
group students completed both a variable-length stradaptive test and a peaked
conventional test; in the second group students completed a variable-length Bayesian
adaptive test (Owen, 1975) and the same peaked conventional test. All tests were
computer-administered and consisted of five-alternative multiple-choice vocabulary
items. Test scores from each of the tests were correlated with high school
grade-point average, University of Minnesota grade-point average, and scores on the
American College Testing Program subtests.

Subjects and Data Collection

Group 1 testees were administered the stradaptive test and the conventional
test. Volunteer testees were college students attending classes at the University
of Minnesota. Most were juniors, seniors, or graduate students enrolled in
psychology courses at the time of testing. A total of 101 students had usable data
for this study. Data were collected during the winter (51.5%) and spring (48.5%)
quarters of 1973. All students were given the conventional test followed by the
stradaptive test or vice versa. The order in which the tests were given was
alternated to control for sequence effects. Both tests were given in a single
administration.

Students in Group 2 were administered the Bayesian adaptive test and the
conventional test. Forty-three percent of the students in this group were given the
tests during spring quarter of 1973; the other 57% were administered the test
during winter quarter of 1974. As in Group 1, all testees were college student
volunteers attending classes at the University of Minnesota; most were juniors,
seniors, or graduate students enrolled in psychology courses at the time of testing.
A total of 131 subjects had usable data. Testees were alternately given the
conventional test followed by the Bayesian adaptive test or vice versa.

All items given were multiple-choice vocabulary items selected from the same
item pool (McBride & Weiss, 1974). Item pools for the stradaptive and Bayesian
tests utilized a subpool that excluded the 40 items in the conventional test. All
tests were presented using cathode-ray terminals (CRTs) acoustically coupled to a
time-shared computer. Items were presented with a number representing each
alternative; testees answered by typing the number of their choice. If testees did
not know the answer and did not wish to guess, they were instructed to respond with
a question mark. Items answered with a question mark were scored as incorrect.
Tests were preceded by instructions on how to use the CRT; basic biographical data
were also collected on the CRT prior to test administration (see DeWitt & Weiss,
1974).



Stradaptive Test 

Item branching. The stradaptive test item pool consisted of 141 items stratified
into 9 strata, or peaked item pools, each varying in level of difficulty.
Stratum 9 contained items of the highest difficulty level, and Stratum 1 included
items of the lowest difficulty level. Entry points for selection of the first item
to be administered to a testee were based on the student's reported grade-point
average (GPA), as shown in Figure 1. Following entry into the stradaptive structure,
an up-one, down-one branching rule was used. That is, a testee was administered
the next unadministered item from the next lower stratum, or difficulty level,
following an incorrect answer or the next unadministered item from the next higher
stratum, or difficulty level, following a correct answer. Question mark responses,
which were treated as incorrect responses, caused the testee to be branched to the
next easier stratum.
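
The up-one, down-one rule can be expressed compactly. The following Python sketch is illustrative only; the function names are hypothetical, and the behavior at the boundaries (remaining in Stratum 9 after a correct answer there, or in Stratum 1 after an incorrect answer there) is an assumption, since the text above does not describe those cases.

```python
def next_stratum(current, correct, lowest=1, highest=9):
    """Up-one, down-one branching: move one stratum harder after a correct
    answer and one stratum easier after an incorrect or '?' response.
    Staying put at the extreme strata is an assumption of this sketch."""
    step = 1 if correct else -1
    return max(lowest, min(highest, current + step))

def stratum_path(entry, responses):
    """Return the sequence of strata visited, starting from the entry stratum.
    Each response is True (correct), False (incorrect), or '?' (omitted);
    only True counts as correct."""
    path = [entry]
    for response in responses:
        path.append(next_stratum(path[-1], response is True))
    return path

path = stratum_path(5, [True, True, False, "?"])
```

Here a testee entering at Stratum 5 who answers two items correctly, one incorrectly, and one with a question mark would be routed through strata 5, 6, 7, 6, and 5.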

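Figure 1
Stradaptive Test Entry Point Question

                                                          Entry
                                                          Stratum
IN WHICH CATEGORY IS YOUR CUMULATIVE GPA TO DATE?         (Not Seen)

     1.   3.76   to   4.00  .............................    9
     2.   3.51   to   3.75  .............................    8
     3.   3.26   to   3.50  .............................    7
     4.   3.01   to   3.25  .............................    6
     5.   2.76   to   3.00  .............................    5
     6.   2.51   to   2.75  .............................    4
     7.   2.26   to   2.50  .............................    3
     8.   2.01   to   2.25  .............................    2
     9.   2.00   or   less  .............................    1

ENTER THE CATEGORY (1 THROUGH 9) AND PRESS THE "RETURN" KEY.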

The stradaptive test was variable length. Testing was terminated when a ceiling
stratum was identified for a testee (Weiss, 1973). The ceiling stratum was
identified as the stratum in which the proportion of correct responses made by the
testee was .20 or less, following the administration of five items in that stratum.
This is the proportion of correct answers expected as a result of random guessing
on five-alternative multiple-choice items. If a ceiling stratum was not identified
after 75 items had been administered, testing was terminated.
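
The termination rule can be sketched as follows. This Python fragment is a hedged illustration, not the program used in the study: it assumes that "following the administration of five items" means at least five items in a stratum, and the names ceiling_stratum and should_stop are hypothetical.

```python
def ceiling_stratum(history, min_items=5, chance=0.20):
    """history: list of (stratum, correct) pairs in order of administration.
    Return a stratum with at least `min_items` items administered and a
    proportion correct at or below chance, or None if there is none."""
    counts = {}
    for stratum, correct in history:
        n, k = counts.get(stratum, (0, 0))
        counts[stratum] = (n + 1, k + (1 if correct else 0))
    ceilings = [s for s, (n, k) in counts.items()
                if n >= min_items and k / n <= chance]
    # If several strata qualify, report the easiest one (an assumption).
    return min(ceilings) if ceilings else None

def should_stop(history, max_items=75):
    """Stop when a ceiling stratum is found or 75 items have been given."""
    return ceiling_stratum(history) is not None or len(history) >= max_items

partial = [(7, False)] * 4 + [(6, True)] * 4
complete = partial + [(7, False)]
```

With four incorrect responses in Stratum 7 the test continues; a fifth incorrect response in that stratum identifies it as the ceiling and ends the test.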

Item pool. Appendix Table A shows the item pool used for the stradaptive
test. Strata included from a minimum of 16 items in Stratum 9, the most difficult
stratum, to a maximum of 36 items in Stratum 1. The item pool was structured and
item selection implemented using a set of item characteristic curve (ICC) item
parameters available at the time that tests were administered; these are referred
to in Table A as original parameters. As described by Prestwood and Weiss (1977),
these parameters were later recalculated for scoring purposes. All ICC item
parameter estimates were based on conversions of the classical difficulty and
discrimination parameters to the ICC metric, as described by McBride and Weiss (1974) and
Prestwood and Weiss (1977). ICC lower asymptote (c, or guessing) parameters were
set at .20 for all items.
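
The cited reports give the conversion actually used; it is not reproduced in this section. As a rough illustration of what such a conversion involves, the sketch below applies one common textbook approximation for the normal ogive model without guessing, in which the discrimination is a = r / sqrt(1 - r^2) and the difficulty is b = z / r, where r is the item biserial correlation and z is the normal deviate cutting off the proportion correct p in the upper tail. Whether the study used exactly these formulas, or how guessing was handled, is an assumption not confirmed here; the function name classical_to_icc is hypothetical.

```python
import math
from statistics import NormalDist

def classical_to_icc(p, r_bis):
    """Approximate normal-ogive ICC parameters from classical statistics.
    p: proportion answering correctly; r_bis: item biserial correlation.
    Guessing is ignored in this approximation."""
    z = NormalDist().inv_cdf(1.0 - p)        # deviate with area p above it
    a = r_bis / math.sqrt(1.0 - r_bis ** 2)  # discrimination
    b = z / r_bis                            # difficulty
    return a, b

a_mid, b_mid = classical_to_icc(0.50, 0.60)
a_easy, b_easy = classical_to_icc(0.80, 0.60)
```

An item answered correctly by half the group receives a difficulty of zero, and an easier item with the same biserial receives a negative difficulty.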

Scoring. The stradaptive test was scored by a number of different scoring
methods, in order to compare the relative validity of different ways of scoring the
same pattern of item responses. Scoring methods that used the ICC item parameters
were applied using both the original and revised item parameters to determine the
effects of the item parameter revision on score validity.

Stradaptive test responses were scored for ability level with two scoring
methods that used only some of the information in the ICC item parameters. The
mean difficulty of all items administered (Mean Difficulty Administered) score was
expected to provide more stable ability estimates because it used difficulty
information from all the items administered to a testee. A potential deficiency of this
score is that it is affected by inappropriate entry points. For example, if a
testee begins the test with items from a stratum of much higher difficulty level
than his/her ability, he or she will have taken more unnecessarily difficult items
than if the test had been begun with items of more appropriate difficulty. Thus, the
Mean Difficulty Administered score would be higher than warranted for the testee.
To eliminate this problem, the mean difficulty of items answered correctly (Mean
Difficulty Correct) score was also computed. This score does not take into account
spuriously administered items of high difficulty unless they are answered
correctly. One potential disadvantage, however, is that it ignores information from items
not answered correctly.
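
The two mean-difficulty scores can be illustrated with a short example. The sketch below uses hypothetical item difficulties and responses, and the function names are introduced here for illustration only.

```python
def mean_difficulty_administered(items):
    """items: list of (difficulty, correct) pairs for one testee."""
    return sum(b for b, _ in items) / len(items)

def mean_difficulty_correct(items):
    """Mean difficulty of correctly answered items; None if none correct."""
    right = [b for b, correct in items if correct]
    return sum(right) / len(right) if right else None

# Hypothetical testee who entered too high: the first two items are
# difficult and missed, after which the test settles near the testee's level.
record = [(2.0, False), (1.0, False), (0.0, True),
          (1.0, False), (0.0, True), (1.0, True)]
administered = mean_difficulty_administered(record)
correct_only = mean_difficulty_correct(record)
```

In this example the two missed high-difficulty entry items raise Mean Difficulty Administered (about .83) well above Mean Difficulty Correct (about .33).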

ICC-based scoring methods (Bejar & Weiss, 1979), which utilize not only the
testee's entire response pattern but also the difficulties, discriminations, and
guessing parameters of all the items administered to a testee, should provide
optimal scoring of any response pattern. To compare the relative validity of these
scoring methods, both Maximum Likelihood and Owen's (1975) Bayesian scoring methods
were used to score the stradaptive test item responses. Bejar and Weiss (1979)
have provided descriptions and computer programs for these scoring methods.

A problem characteristic of Maximum Likelihood scoring is that a score cannot
be determined for testees who answer every item correctly, who answer every item
incorrectly, or who have very unusual response patterns (e.g., answering many
difficult items correctly and many easy items incorrectly). In these cases the
estimation procedure fails to converge, i.e., it converges on plus or minus infinity
(Kingsbury & Weiss, 1979). In the stradaptive data, two testees had item response
patterns that failed to converge using the Maximum Likelihood scoring procedure.
Their test scores, derived from this procedure, were deleted from the data
analyses.
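
The non-convergence problem can be demonstrated with a simple stand-in for the iterative procedure. The sketch below is not the Bejar and Weiss (1979) program: it assumes a three-parameter logistic ICC with scaling constant 1.7 and c = .20, searches a fixed grid of ability values instead of iterating, and treats a maximum at either edge of the grid as a failure to converge. All names and item parameters are hypothetical.

```python
import math

def p_correct(theta, a, b, c=0.20):
    """Assumed 3PL logistic ICC."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if u else math.log(1.0 - p)
    return total

def ml_estimate(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate; None if the maximum lies
    on the edge of the grid (likelihood still rising toward infinity)."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    best = max(grid, key=lambda t: log_likelihood(t, items, responses))
    return None if best in (lo, hi) else best

items = [(1.0, b) for b in (-1.0, -0.5, 0.0, 0.5, 1.0)]
all_right = ml_estimate(items, [1, 1, 1, 1, 1])
all_wrong = ml_estimate(items, [0, 0, 0, 0, 0])
mixed = ml_estimate(items, [1, 1, 1, 0, 0])
```

For the all-correct and all-incorrect patterns the likelihood has no interior maximum, so no finite estimate is returned; the mixed pattern yields a finite estimate.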


The preceding four scores are all "point estimates" of ability level (Weiss,
1973). However, as Trabin and Weiss (1979) have shown, there is additional
information in test item response patterns beyond these point estimates. An individual
whose response pattern fluctuates between several strata is a more inconsistent
responder than one who is administered items from only a few strata adjacent to one
another. Consistency among scores indicates either the stability of a testee's
ability estimate (Weiss, 1973, p. 26) or the testee's fit to the ICC model. In
this study the standard deviation of item difficulties of all items administered
(SD Administered) was used as one consistency score. This score was chosen from
among the available types of consistency scores to reflect the dispersion of the
difficulties of all items administered, not just those items that were answered
correctly, in order to make more complete use of the item response patterns
available. In addition, the standard error of Owen's Bayesian score (SE Owen's
Bayesian) was used as a second consistency score.
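
The SD Administered score is simply the dispersion of the difficulties of the items a testee received. A minimal sketch follows; the use of the population (N) denominator is an assumption, the difficulty values are hypothetical, and the function name is introduced here for illustration.

```python
import math

def sd_administered(difficulties):
    """Standard deviation of the difficulties of all items administered.
    The N (rather than N - 1) denominator is an assumption of this sketch."""
    n = len(difficulties)
    mean = sum(difficulties) / n
    return math.sqrt(sum((b - mean) ** 2 for b in difficulties) / n)

# Hypothetical testees: one stays within adjacent strata, one swings widely.
consistent = sd_administered([0.0, 0.5, 0.0, 0.5])
inconsistent = sd_administered([-2.0, 2.0, -2.0, 2.0])
```

The testee whose items span a wide range of difficulty receives the larger value, indicating a less consistent response pattern.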
Bayesian Adaptive Test

A variable-length adaptive test based on Owen's (1975; McBride & Weiss, 197…)
Bayesian adaptive testing strategy was administered to all testees in Group 2. The
item pool for this test consisted of 200 items selected from a larger pool (McBride
& Weiss, 1974) after the conventional test items were excluded. Items in the pool
ranged in difficulty from b=-3.19 to b=2.95; all items had a values of .40 or
greater (see Appendix Table B). Items were selected and scored using only the
original item parameters.

The Bayesian adaptive test was begun with differential prior ability estimates
(θ), as shown in Figure 2. The prior θs shown in Figure 2 for each of the levels
of student-reported grade-point average (GPA) were chosen to reflect a positive
level of correlation between GPA and vocabulary ability as measured by the adaptive
test; the relatively lower θ values for higher GPAs were designed to take into
account chance successes resulting from guessing. The relatively large variances
of the prior θ values were chosen to reflect a high degree of uncertainty about the
prior ability estimates, so as not to assume a very high positive correlation
between GPA and vocabulary ability. Testing was terminated either when the variance
of the posterior ability estimate was .09 or less, reflecting a standard error of
.30 or less, or when a maximum of 135 items had been administered.

Figure 2
Bayesian Test Entry Point Question

                                             Initial Values Set for
                                             Bayesian Ability Estimate (θ)
IN WHICH CATEGORY IS YOUR CUMULATIVE         and Variance of θ
GPA TO DATE?
                                                            Variance
                                                   θ          of θ
     1.   3.76 to 4.00                            1.23         3.5
     2.   3.51 to 3.75                             .77         3.0
     3.   3.26 to 3.50                             .50         2.5
     4.   3.01 to 3.25                             .18         2.0
     5.   2.76 to 3.00                             .09         2.0
     6.   2.51 to 2.75                            -.31         2.5
     7.   2.26 to 2.50                            -.56         3.0
     8.   2.01 to 2.25                            -.85         3.5
     9.   2.00 or less                           -1.41         4.0

ENTER THE CATEGORY (1 THROUGH 9) AND PRESS THE "RETURN" KEY.
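
The entry priors and the stopping rule for the Bayesian test can be summarized in a few lines. The sketch below is illustrative only: it encodes the Figure 2 values and the termination criteria described in the text, but it does not implement Owen's (1975) posterior updating; the names PRIORS and bayesian_should_stop are hypothetical.

```python
import math

# GPA category -> (prior ability estimate, prior variance), from Figure 2.
PRIORS = {
    1: (1.23, 3.5), 2: (0.77, 3.0), 3: (0.50, 2.5),
    4: (0.18, 2.0), 5: (0.09, 2.0), 6: (-0.31, 2.5),
    7: (-0.56, 3.0), 8: (-0.85, 3.5), 9: (-1.41, 4.0),
}

def bayesian_should_stop(posterior_variance, n_items,
                         max_variance=0.09, max_items=135):
    """Stop when the posterior variance reaches .09 or 135 items are given."""
    return posterior_variance <= max_variance or n_items >= max_items

# A posterior variance of .09 corresponds to a standard error of .30.
standard_error_at_stop = math.sqrt(0.09)
```

Because the prior variances (2.0 to 4.0) are far above the .09 criterion, every testee must answer a number of items before the posterior variance can fall low enough to end the test.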

Conventional Test

The same 40-item peaked conventional test was administered to the groups of
students who took the stradaptive and Bayesian tests. Items were selected based on
a proportion correct about .60, in order to adjust the average difficulty of the
items for guessing, and high biserial correlations with total score.

Appendix Table C shows the ICC item discrimination and difficulty parameter
estimates for items in the conventional test. The standard deviation of the item
difficulties for this test was .11, which was considerably lower than those of
either the stradaptive or Bayesian test item pools. The average item discrimination
of the stradaptive pool (a=.745 for the original parameters) was slightly
higher than that of the conventional test (a=.543), as was the average discrimination
of the Bayesian pool (a=.796). The conventional test was scored by counting
the number of correct answers (Number Correct score); omitted answers were scored
as incorrect.



Because the tests being investigated were verbal ability tests, the criterion
variables were chosen to reflect this ability. Four different variables were
obtained from student records, but not all variables could be obtained for every
student in the two groups:

1. High school GPA (HS-GPA);

2. University of Minnesota overall GPA (UM-OGPA);

3. University of Minnesota math GPA (UM-MGPA), which was used to partial out
the effects of numerical ability, resulting in a partial GPA (UM-PGPA); and

4. American College Testing Program (ACT) test scores.
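
One way to remove the linear effect of math GPA from overall GPA before correlating it with a test score is a part (semipartial) correlation. The sketch below is an assumption about the general approach, not a statement of the exact computation used in the study; the data values and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def part_correlation(x, y, z):
    """Correlation of x with the part of y not linearly predictable from z:
    (r_xy - r_xz * r_yz) / sqrt(1 - r_yz ** 2)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt(1.0 - r_yz ** 2)

# Hypothetical values: test score, overall GPA, and math GPA for 5 students.
test_score = [1.0, 2.0, 3.0, 4.0, 5.0]
overall_gpa = [2.0, 1.0, 4.0, 3.0, 6.0]
math_gpa = [1.0, 3.0, 2.0, 5.0, 4.0]
r_part = part_correlation(test_score, overall_gpa, math_gpa)
```

The same value is obtained by regressing overall GPA on math GPA and correlating the test score with the residuals.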

All GPAs were calculated by assigning the following numerical values to letter
grades: A=4, B=3, C=2, D=1. HS-GPA was calculated as the overall GPA of the
students when they were sophomores through seniors in high school; UM-OGPA was
computed as the overall college GPA of the students through the spring of 1976; and
UM-MGPA was derived from the GPA of all math classes taken by the students at the
University of Minnesota.
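
The grade-to-number conversion can be sketched as follows. This is an illustration under stated assumptions: only the four letter values given above are mapped, grades outside that mapping are skipped, and courses are weighted equally, none of which is specified in the text. The names GRADE_POINTS and gpa are hypothetical.

```python
# Letter-grade values as stated in the text.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}

def gpa(grades):
    """Unweighted mean of grade points; grades not in the mapping are
    skipped (an assumption), and None is returned if nothing is usable."""
    points = [GRADE_POINTS[g] for g in grades if g in GRADE_POINTS]
    return sum(points) / len(points) if points else None

example_gpa = gpa(["A", "B", "B", "C"])
```

A record of one A, two Bs, and one C gives a GPA of 3.0 under this scheme.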



The ACT battery was administered to the students in either their junior or
senior years of high school. The test is designed to measure a student's ability
to perform "typical intellectual tasks asked of college students." The ACT resulted
in five scores: English, mathematics, social science, natural science, and a
composite score.

Data for two of the criterion variables were available prior to test
administration (HS-GPA and ACT scores). Data for the other two criteria were gathered
after the students had taken the conventional and adaptive tests.

Data Analysis

Comparison of the Adaptive and Conventional Tests

The adaptive and conventional tests were designed to compare the respective
criterion-related validities of the testing strategies against the four external
criteria. Comparative validity assessments were of specific interest. Predictor
variables used were the ability estimates from both adaptive tests and the
conventional test. Consequently, Pearson product-moment correlations were calculated
between ability estimates derived from the adaptive tests and the four external
criteria and between the conventional test and these four measures.



In addition, the mean, median, standard deviation, skewness, and kurtosis were
calculated for all predictor variables and the criterion variables. Although
ability estimates derived from the different test administration strategies and scoring
methods could not be evaluated on how closely they reflected the true underlying
ability distribution, because this distribution was not known for the testees, these
data provided a relative comparison of how the different testing strategies and
scoring methods described the individual differences among the students tested.
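
These descriptive statistics can be computed from moments about the mean. The sketch below is illustrative: it uses population (N-denominator) moments, an ordinary median, and kurtosis expressed as excess over the normal value of 3, all of which are assumptions, since the exact formulas used in the study are not stated here. The function name describe is hypothetical.

```python
import math

def describe(values):
    """Mean, median, SD, skewness, and excess kurtosis from central moments."""
    n = len(values)
    mean = sum(values) / n
    ordered = sorted(values)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": median,
        "sd": math.sqrt(m2),
        "skew": m3 / m2 ** 1.5,          # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2 - 3.0,  # negative values: platykurtic
    }

stats = describe([1, 2, 3, 4, 5])
```

Under this convention a negative kurtosis indicates a distribution flatter than the normal (platykurtic) and a positive value one that is more peaked (leptokurtic).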

Correlations between Stradaptive and Conventional Test Scores

To determine whether the adaptive and conventional tests were measuring the
same ability, ability estimates from the adaptive tests were correlated with scores
from the 40-item conventional test for all examinees who completed both tests.
Correlations were calculated using both original and revised item parameters for
all stradaptive scoring methods.

These data also provided intercorrelations among scores on the stradaptive
test for both the original and revised item parameters. This comparison provided
information on the effects of using the original item parameters. Correlations of
these scores with the criterion variables also permitted evaluation of the effect
of the different item parameter estimates on criterion-related validity.

Test Length versus Ability

Ability estimates from both the Bayesian and stradaptive tests were correlated
with test length. For the stradaptive test this analysis was performed to
determine if the scoring method interacted with item pool characteristics, resulting in
different correlations for the various scores and test lengths. These correlations
were also computed for scores derived from the two different sets of item
parameters.



RESULTS

Characteristics of Score Distributions 

Table 1 shows descriptive statistics for scores for all tests administered in
both groups.

Conventional test. The 40-item conventional test performed almost identically
in both groups; there was no significant difference in the mean test scores for the
two groups. The average number-correct scores (Number Correct) were 22.60 and
22.82, with standard deviations of 8.33 and 9.01 in Group 1 and Group 2,
respectively. These mean scores were very close to the predicted means for the group on
which the test was constructed. Neither score distribution was significantly
skewed, although both distributions were significantly platykurtic, indicating a
flatness in the scores in comparison to a normal distribution.

Stradaptive test. The stradaptive test administered an average of 29.29
items, with a median of 21. The distribution of number of items administered (Number
Administered) was significantly positively skewed and leptokurtic, indicating a
distribution that was more peaked than a normal distribution, with a few very long
test lengths. The distribution of number-correct scores (Number Correct) for the
stradaptive test was skewed similarly to that of Number Administered but with a
mean of 14.90 and a median of 11.20. Both the means and medians indicate that, on
the average, the stradaptive test functioned almost optimally, administering to the
average students items that were answered correctly about 50% of the time. The
average Number Administered in the stradaptive test was 25% lower than the 40-item
length of the conventional test.
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Table i 

Descriptive Statistics for Scores from Conventional,
Stradaptive, and Bayesian Adaptive Tests

Test and Score                        N      Mean    Median        SD      Skew  Kurtosis

Conventional Test
  Number Correct
    Group 1                         100     22.60     21.50      8.33       .13    -1.08*
    Group 2                         131     22.82     22.60      9.01       .04    -1.09*

Stradaptive Test (Group 1)
  Number Administered               101     29.29     21.00     24.03    2.50**    7.08**
  Number Correct                    101     14.90     11.20     12.04    2.31**    6.58**
  Original Item Parameters
    Mean Difficulty Administered    101       .26       .17      1.00       .15      -.71
    Mean Difficulty Correct         101      -.10      -.18      1.04       .28      -.62
    Owen's Bayesian                 101      -.18      -.30       .94       .31       .17
    Maximum Likelihood              100      -.05      -.30      1.14     .81**       .78
    SD Administered                 101       .73       .72       .19     .68**      1.31
    SE Owen's Bayesian              101                 .39       .15    1.28**    2.52**
  Revised Item Parameters
    Mean Difficulty Administered    101       .68       .57      1.10       .16      -.75
    Mean Difficulty Correct         101       .26       .17      1.12       .31      -.58
    Owen's Bayesian                 101       .23       .12      1.08      .41*       .05
    Maximum Likelihood               99       .30       .20      1.11      .49*       .08
    SD Administered                 101       .84       .80       .23      .47*       .28
    SE Owen's Bayesian              101       .32       .29       .21    4.47**   23.86**

Bayesian Adaptive Test (Group 2)
  Number Administered               131     48.75     35.00     29.71     .90**      -.04
  Number Correct                    131     25.56     16.42     19.36    1.83**    4.03**
  Bayesian Ability Estimate         131       .36       .06      1.17       .34      -.62
  Variance of Ability Estimate      131       .08       .08       .02    6.78**   48.04**

 *Statistically different from zero at p < .05.
**Statistically different from zero at p < .01.

Mean ability scores using the original item parameters were similar for Mean
Difficulty Correct (-.10), Maximum Likelihood (-.05), and Owen's Bayesian (-.18)
scoring methods; as expected, the average Mean Difficulty Administered scores were
different from the other scores, due to some inappropriately high entry point
estimates. Owen's Bayesian scoring resulted in the lowest mean ability estimate (-.18);
median ability estimates for Owen's Bayesian and Maximum Likelihood scores were
identical (-.30). All ability estimate distributions were positively skewed,
although only the Maximum Likelihood score was significantly skewed. The
distributions of the two latent-trait-based scores were leptokurtic, whereas the mean
difficulty scores were platykurtic; however, none of these kurtosis values were
significantly different from a normal distribution. In contrast to Number Correct
for the conventional test, three of the four stradaptive ability scores using the
original item parameters better approximated a normal distribution. Both the SD
Administered and SE Owen's Bayesian scores resulted in positively skewed and peaked
distributions.

Using the revised item parameters, the four stradaptive ability scores showed
nearly equal standard deviations and positive skew. Owen's Bayesian score and the
Maximum Likelihood score had significant positive skew (p < .05). The mean
difficulty scores were platykurtic, but not significantly so, whereas the Bayesian and
Maximum Likelihood estimates did not deviate from normal kurtosis. All medians of
the ability estimates were smaller than the corresponding means. Again, the Mean
Difficulty Administered score had a higher mean (and median) than did the other
three ability scores. The SD Administered score and the SE Owen's Bayesian score
had similar distributions with the revised parameters as they did with the original
item parameters. Both means and medians of all scores computed using the revised
item parameters were consistently higher than they were using the original item
parameters.

Bayesian adaptive test. Mean test length for the Bayesian adaptive test was
48.75 items, an increase of 8.75 items (22%) over the length of the 40-item
conventional test. The median test length for this test, however, was 35 items, a 12.5%
reduction from the conventional test length. Thus, some of the Bayesian adaptive
tests were quite long, resulting in a positively skewed distribution of Number
Administered (50 students answered more than 50 items, and 19 students answered more
than 80 items). These long test lengths were probably due to the large prior
variances used in selecting the first item for the Bayesian test in conjunction with
the small posterior variance used to terminate the test. Both the mean and median
of the Number Correct in the Bayesian test (25.56 and 16.42, respectively) show
that the Bayesian test operated properly in administering items at a difficulty
level so that about 50% of the items administered were answered correctly.
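
The length comparisons and the "about 50% correct" statement can be checked with simple arithmetic. The sketch below uses only figures quoted above; the function names are hypothetical. Note that dividing one group summary by another (mean by mean, median by median) only approximates the average per-student proportion correct, which the report does not give.

```python
def percent_change(new, base):
    """Percentage change of `new` relative to `base` (negative = reduction)."""
    return 100.0 * (new - base) / base


def proportion_correct(number_correct, number_administered):
    """Ratio of a Number Correct summary to a Number Administered summary."""
    return number_correct / number_administered


CONVENTIONAL_LENGTH = 40  # items in the conventional test

mean_change = percent_change(48.75, CONVENTIONAL_LENGTH)    # Bayesian mean length
median_change = percent_change(35.0, CONVENTIONAL_LENGTH)   # Bayesian median length
ratio_of_means = proportion_correct(25.56, 48.75)
ratio_of_medians = proportion_correct(16.42, 35.0)
```

Under these figures the mean length is roughly 22% above 40 items, the median is 12.5% below it, and both ratios fall close to one half.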

The Bayesian ability estimates were distributed normally with slight, but
nonsignificant, platykurtosis. The variance of the ability estimates had a very
peaked distribution, with a significant positive skew.

Criterion Variable Distributions

Table 2 presents descriptive statistics for the criterion variables for both
groups. The means for HS-GPA in both groups were higher than means of either
UM-MGPA or UM-OGPA, both within groups and between groups.
The distributions of HS-GPA and UM-OGPA had significant negative skew in Group 1,
but skew was not significant in Group 2. None of the GPA distributions differed
significantly from normality in terms of kurtosis in either group, although there
was a slight tendency toward platykurtosis. The standard deviations for all GPAs
were very similar.

ACT mean scores ranged from 22.00 to 26.61 and were essentially equivalent for
the two groups. Standard deviations varied from 3.52 to 6.47 and were also
comparable for the two groups. All ACT scores were negatively skewed, with several
significantly so; there was a general tendency for ACT scores to be leptokurtically
distributed, although most did not differ significantly from normal in terms of
kurtosis. None of the differences in mean scores between the two groups on any of
the criterion variables were statistically significant (p < .05).

Test Score Correlations

Stradaptive and conventional tests. Product-moment intercorrelations among
the four stradaptive ability estimates and the corresponding consistency scores are
shown in Table 3. Also, correlations are shown between scores derived from the
original item parameters and the revised item parameters of the stradaptive test,
and with Number Correct on the conventional test. Also included are the students'

Table 2

Descriptive Statistics for Criterion Variables

Group and Criterion        N      Mean    Median        SD      Skew  Kurtosis

Group 1
  HS-GPA                  56      3.12      3.15       .68     -.72*      -.21
  UM-MGPA                 77      2.81      3.00       .83      -.41      -.62
  UM-OGPA                101      2.80      2.90       .73    -.76**       .37
  ACT Score
    English               55     22.00     21.95      3.52      -.33       .26
    Mathematics           55     25.98     27.25      6.47    -.91**       .35
    Social Science        55     24.93     25.42      4.50    -.82**     1.43*
    Natural Science       55     25.42     25.57      5.76      -.51      -.83
    Composite             55     24.76     25.00      4.39      -.46      -.50

Group 2
  HS-GPA                  71      3.17      3.14       .55      -.49       .01
  UM-MGPA                104      2.71      2.67       .76      -.08      -.39
  UM-OGPA                131      2.81      2.83       .60      -.22      -.47
  ACT Score
    English               72     22.03     22.30      4.23      -.21      1.76
    Mathematics           71     26.10     26.89      5.41      -.78       .47
    Social Science        71     24.79     26.00      5.04   -1.11**       .72
    Natural Science       71     26.61     27.91      5.00   -1.55**    3.18**
    Composite             71     24.99     25.44      3.93      -.77       .25

 *Statistically different from zero at p < .05.
**Statistically different from zero at p < .01.

reported GPAs used as an entry point to the stradaptive test, and Number
Administered and Number Correct in the stradaptive test.

Although there were nonsignificant correlations between the entry point and
Number Administered and Number Correct, the latter two variables correlated .97.
This high correlation resulted from the lack of very difficult items in the
stradaptive test (e.g., Stratum 9, the most difficult stratum, had only 16 items), which
resulted in the inability of the test to locate a ceiling stratum for students with
very high ability. Thus, for these students, the test would continue administering
items that were answered correctly.



Using both the original and revised item parameters, the entry point variable
(reported GPA) had moderate and significant correlations with all ability scores;
the lowest were r = .31 and .26 with Owen's Bayesian score for the original and
revised parameters, respectively. Entry point data correlated highest (r = .45 and
.46) with the Mean Difficulty Administered score. Although the entry point data
correlated nonsignificantly with the SD Administered consistency score, the SE
Owen's Bayesian consistency score correlated significantly (r = .33 and .44) with
entry point data. This latter result, however, is likely a result of the same
factors that resulted in the correlation of .97 between Number Correct and Number
Administered. Stradaptive entry point data also correlated r = .34 with Number Correct
on the conventional test, whereas neither Number Administered nor Number Correct in
the stradaptive test correlated significantly with Number Correct on the
conventional test.



Table 3

Intercorrelations of Scores from Stradaptive and Conventional Tests (N=101)

                                                                 Score
Test and Score                        1    2    3    4    5    6    7    8    9   10   11   12   13   14   15

Stradaptive Test
   1. Entry Point (Reported GPA)
   2. Number Administered           [?]
   3. Number Correct               -.18  .97
  Original Item Parameters
   4. Mean Difficulty Administered  .46 -.07  .04
   5. Mean Difficulty Correct       .43 -.06  .06 1.00
   6. Owen's Bayesian               .31 -.09  .04  .96  .97
   7. Maximum Likelihood            .34 -.05  .04  .96  .96 1.00
   8. SD Administered              -.01  .4?  .50  .11  .09  .06  .01
   9. SE Owen's Bayesian            .33 -.34 -.30  .75  .75  .73  .78  [?]
  Revised Item Parameters
  10. Mean Difficulty Administered  .45  [?]  .04 1.00  .99  .95  .95  .11  .75
  11. Mean Difficulty Correct       .42  [?]  .05  [?] 1.00  .96  .96  .09  .76  .99
  12. Owen's Bayesian               .26  [?]  .04  .96  .97  .98  .97  .07  .74  .96  .97
  13. Maximum Likelihood            .33 -.06  [?]  .96  .96  .98  .98  .03  .78  .96  .96  .97
  14. SD Administered              -.15  .58  .61  .33  .31  .26  .21  .94 -.19  .33  .30  .27  .23
  15. SE Owen's Bayesian            .44 -.52 -.45  [?]  .39  .36  .44  .40  .72  .38  .40  .38  .38 -.32
Conventional Test
  16. Number Correct                .34 -.07  .03  .85  .85  .82  .80  .16  .61  .84  .85  .82  .79  .36  .31

Correlations > ±.30 are significant at p < .001; > ±.23 are significant at p < .01; > ±.16 are significant at p < .05.

For both the original and revised item parameters, all stradaptive ability
estimates correlated .96 or higher. Mean Difficulty Correct correlated .97 with
Owen's Bayesian score in both cases and .96 with the Maximum Likelihood score; Mean
Difficulty Administered correlated .96 with these two scores in both cases, and
Owen's Bayesian and Maximum Likelihood scores correlated 1.00 and .97. These
results show that the simple average difficulty scores ordered students almost
identically with the more complex latent-trait-based scores.
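
The significance cutoffs quoted in the note to Table 3 can be checked with the usual t test for a correlation coefficient. The sketch below is illustrative only: it assumes one-tailed tests with df = N - 2 = 99, which the report does not state, and the critical t values are approximate figures from standard tables rather than values taken from the report.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a product-moment correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| that reaches the critical t value for sample size n."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Assumed one-tailed critical t values for df = 99 (approximate table values).
ONE_TAILED_T = {0.05: 1.660, 0.01: 2.365, 0.001: 3.175}

thresholds = {alpha: round(critical_r(t, 101), 2)
              for alpha, t in ONE_TAILED_T.items()}
```

With N = 101, these assumptions give cutoffs of .16, .23, and .30, matching the table note to two decimals.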

The only obvious effect of revising the item parameter estimates was on the
correlations of the consistency scores with the ability scores. Using the original
item parameters, the SD Administered score correlated nonsignificantly with all
ability scores, and the SE Owen's Bayesian score correlated from .73 to .78 with
ability scores. For these same variables, using the revised item parameters, both
the SD Administered and SE Owen's Bayesian scores correlated significantly with the
ability scores, but correlations ranged only from .23 to .40. The effect of the
revised parameter estimates on these two consistency scores is seen in the
correlation of .94 between original and revised parameter estimates for the SD
Administered score, whereas the corresponding correlation for the SE Owen's Bayesian
score was only .72.




Revision of the item parameter estimates had no important effect on the
ability scores. Intercorrelations of ability estimates using the two sets of item
parameter estimates ranged from .95 to 1.00; correlations computed between the same
ability score using the two sets of item parameter estimates were .98 or 1.00.
These correlations were as high as the intercorrelations of different types of
ability estimates using a common set of item parameters.

Convergent validity of the stradaptive ability scores is indicated by their
relatively high correlations with the conventional test. These correlations, which
were not affected by use of the different item parameter estimates, ranged from .79
to .85, with a tendency for the non-latent-trait-based scores to correlate higher
with conventional test scores than did the scores using latent trait scoring
methods. Correlations of the consistency scores with conventional test scores
differed for the two kinds of item parameter estimates.

Bayesian and conventional tests. Product-moment correlations of scores from
the Bayesian adaptive test and the conventional test are shown in Table 4. Number
Administered in the Bayesian test correlated highest (r = .90) with Number Correct in
that test. This resulted from a lack of highly discriminating items of high
difficulty in the Bayesian item pool, similar to the correlation of the same variables
in the stradaptive test. Therefore, more items of low discrimination were
necessary to reach the fixed posterior variance termination criterion for high ability
students than for low ability students, for whom more highly discriminating items
were available. This is further supported by the correlation between Number
Correct and the Bayesian ability estimate (r = .89) and between the Bayesian ability
estimate and Number Administered (r = .84). A high and significant correlation (r = .85)
was observed between the Bayesian ability estimate and the conventional test Number
Correct score, indicating that they were both measuring the same trait. Bayesian
test length (which, because of its high correlation with the Bayesian ability
estimate, essentially measured ability level) correlated moderately (r = .59) with
Number Correct on the conventional test, whereas Number Correct on the two tests
correlated .72. The variance of the Bayesian ability estimate, which was essentially
fixed for all but the very high ability testees (for whom there were not
sufficiently discriminating items available), correlated essentially zero with all
variables.
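
The link between item discrimination and test length under a fixed posterior variance stopping rule can be illustrated with a deliberately simplified model. The sketch below is not Owen's updating procedure: it assumes a normal prior and treats every item as adding the same fixed amount of information (precision), and all names and numbers in it are hypothetical.

```python
import math


def items_needed(prior_var, target_var, info_per_item):
    """Items required before the posterior variance falls to target_var.

    Simplifying assumption: each item adds `info_per_item` to the posterior
    precision (1 / variance), as in a normal-normal update with constant
    item information. Real adaptive tests vary the information item by item.
    """
    precision_gap = 1.0 / target_var - 1.0 / prior_var
    return max(0, math.ceil(precision_gap / info_per_item))


# Hypothetical values: same prior and stopping rule, different item quality.
well_matched = items_needed(prior_var=1.0, target_var=0.0625, info_per_item=0.5)
poorly_matched = items_needed(prior_var=1.0, target_var=0.0625, info_per_item=0.25)
```

Halving the information supplied by each item doubles the number of items needed to reach the same posterior variance, which is the pattern described above for high ability students who ran out of highly discriminating items.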



Table 4

Intercorrelations of Bayesian Adaptive
and Conventional Test Scores

                                            Score
Test and Score                          1      2      3      4

Bayesian Adaptive Test
  1. Number Administered
  2. Number Correct                   .90
  3. Ability Estimate                 .84    .89
  4. Variance of Ability Estimate    -.07   -.08    .18
Conventional Test
  Number Correct                      .59    .72    .85

Note. Correlations > .28 significant at p < .001;
> .17 significant at p < .05.

Intercorrelations of Criterion Variables

Table 5 shows the intercorrelations of the three GPA variables and the five
ACT scores for the two groups. As expected, the highest intercorrelations within
each group were between the four subscores of the ACT and the ACT composite.
HS-GPA was most highly correlated with the ACT math score in Group 1 and with
UM-OGPA in Group 2, and UM-OGPA was most highly correlated with the ACT composite
score in both groups. UM-MGPA correlated highest with the ACT social science
score, and UM-OGPA correlated highest with the ACT composite score (r = .43 and .51)
for Group 1. Both UM-MGPA and UM-OGPA correlated highest among the ACT scores with
the ACT composite (r = .33 and .53) for Group 2. Thus, the three GPA measures appear to
have provided different criterion information than the ACT scores, whereas the ACT
composite score provided much of the same information as the four ACT subscores
from which it was derived (r = .78 to .89 in Group 1 and .74 to .86 in Group 2).



Stradaptive versus conventional. Table 6 shows the validity correlations for
the stradaptive and conventional testing strategies. For all three GPA variables,
the best predictor was reported college GPA, the stradaptive entry point
information. In the prediction of HS-GPA, the conventional test Number Correct score
correlated .40 and the stradaptive ability scores correlated from .41 to .45, with
essentially no difference between scores derived from the two sets of item
parameter estimates. Using both sets of item parameter estimates, Mean Difficulty
Administered achieved the highest validity. In predicting UM-MGPA, Number Correct on
the conventional test correlated .31, and the best of the adaptive scores (Mean
Difficulty Administered using the revised parameters) correlated .32. Again, the
Mean Difficulty Administered score obtained the highest correlation among the
stradaptive scores, followed closely by Mean Difficulty Correct; the two
latent-trait-based scoring methods (Bayesian and Maximum Likelihood) resulted in
lower validities.



Table 5

Intercorrelations of Criterion Variables for Both Groups

                                              Criterion Variable
                                       GPA                      ACT Score
Group and                                                             Social  Natural
Criterion Variable        N      HS    UM-M    UM-O  English    Math Science  Science

Group 1
  HS-GPA                 56
  UM-MGPA                77     .46
  UM-OGPA               101     .63     .67
  ACT Score
    English              55     .57     .39     .44
    Math                 55     .71     .31     .49     .58
    Social Science       55     .49     .43     .43     .63     .61
    Natural Science      55     .43    .22*     .37     .61     .71     .64
    Composite            55     .66     .40     .51     .78     .88     .83     .89

Group 2
  HS-GPA                 71
  UM-MGPA               104     .46
  UM-OGPA               131     .61     .78
  ACT Score
    English              72     .40     .19     .41
    Math                 71     .55     .27     .37     .40
    Social Science       71     .46    .20*     .47     .60     .40
    Natural Science      71     .46     .31     .41     .47     .61     .63
    Composite            71     .58     .33     .53     .74     .77     .82     .86

All correlations are statistically different from 0.0 (p < .05) except those with an *.

Table 6

Correlations of Criterion Variables with Scores from Stradaptive and Conventional Tests

                                                          Criterion Variable
                                            GPA                             ACT Score
                                                                          Social  Natural    Com-
Test and Score                       HS    UM-M    UM-O  English    Math Science  Science  posite

Stradaptive Test
  Entry Point (Reported GPA)        [?]     [?]     [?]     [?]     [?]     [?]     [?]   .40**
  Number Administered              -.17     .02     [?]     [?]     [?]     [?]     [?]     [?]
  Number Correct                   -.16     [?]     [?]     [?]     [?]     [?]     [?]  -.30**
  Original Item Parameters
    Mean Difficulty Administered    [?]     [?]     [?]     [?]     [?]     [?]     [?]   .59**
    Mean Difficulty Correct         [?]     [?]     [?]     [?]     [?]     [?]     [?]   .58**
    Owen's Bayesian                 [?]     .10     [?]     [?]   .39**   .55**   .54**   .58**
    Maximum Likelihood            .41**    .24*     [?]   .57**   .37**   .54**   .52**   .56**
    SD Administered                 .03     .10    -.05    -.19    -.07    -.06    -.14    -.13
    SE Owen's Bayesian            .36**   .28**    .21*   .53**   .39**   .52**   .47**   .54**
  Revised Item Parameters
    Mean Difficulty Administered  .44**   .32**   .27**   .51**   .40**     .5?   .52**   .59**
    Mean Difficulty Correct       .43**   .30**   .25**   .60**   .38**   .57**   .51**   .58**
    Owen's Bayesian               .43**    .24*    .19*   .62**   .38**   .57**   .53**   .58**
    Maximum Likelihood            .41**    .25*    .18*   .58**   .36**   .52**   .54**   .56**
    SD Administered                 .1?    .20*     [?]    -.01     .06     .19    -.01     .04
    SE Owen's Bayesian             .24*   .29**    .18*   .30**    .23*   .35**    .27*   .35**
Conventional Test
  Number Correct                  .40**   .31**     .14   .62**   .40**   .54**   .52**   .58**

 *Statistically different from zero at p < .05.
**Statistically different from zero at p < .01.

The most striking differences in validity between the adaptive and
conventional tests were obtained on UM-OGPA (for which the largest sample size
was available). Number Correct on the conventional test correlated .14 with
UM-OGPA, which was not significantly different from zero. By contrast, using the
revised item parameters, the correlations of all stradaptive scores were
significantly different from zero, ranging from r = .18 to .27. Using the original
parameters, three of the four stradaptive score correlations were significantly
different from zero, the exception being the Bayesian score. Thus, the best stradaptive
scoring method (Mean Difficulty Administered) accounted for 3.7 times the amount of
criterion variance accounted for by the conventional test Number Correct score; the
second best stradaptive scoring method (Mean Difficulty Correct) accounted for 3.2
times more common variance. It should also be recalled that the stradaptive test
administered 25% fewer items, on the average, than did the conventional test. Thus,
the higher validities were obtained despite shorter test lengths.
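
The "3.7 times" and "3.2 times" figures follow from comparing squared correlations, since the proportion of criterion variance accounted for by a predictor is r squared. A minimal sketch using the correlations quoted above (the function name is hypothetical):

```python
def variance_ratio(r_a, r_b):
    """Ratio of criterion variance accounted for by score A to that by score B."""
    return (r_a ** 2) / (r_b ** 2)


R_CONVENTIONAL = 0.14      # Number Correct with UM-OGPA
R_MEAN_DIFF_ADMIN = 0.27   # Mean Difficulty Administered with UM-OGPA
R_MEAN_DIFF_CORRECT = 0.25 # Mean Difficulty Correct with UM-OGPA

ratio_administered = variance_ratio(R_MEAN_DIFF_ADMIN, R_CONVENTIONAL)
ratio_correct = variance_ratio(R_MEAN_DIFF_CORRECT, R_CONVENTIONAL)
```

Rounded to one decimal, the two ratios are 3.7 and 3.2.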

Correlations of stradaptive and conventional test scores with ACT scores were
similar to the correlations of stradaptive and conventional scores with HS-GPA
and UM-MGPA. For all but ACT English, one or more of the stradaptive test scores
correlated higher than did the conventional test score. For ACT English, Number
Correct on the conventional test correlated .62, as did Owen's Bayesian score on
the stradaptive test with revised parameter estimates. The largest difference in
correlations between the conventional test and the stradaptive test was with ACT
social science; the conventional test Number Correct score correlation of .54 was
exceeded by all but Maximum Likelihood scoring of the stradaptive test, with
correlations ranging from .55 to .58. In almost every case where stradaptive score
validities exceeded those of the conventional test, highest correlations were obtained
with the Mean Difficulty Administered score. Lowest correlations between
stradaptive scores and ACT scores were generally obtained with the Maximum Likelihood
scoring method.

Results of significance tests on the differences in the validity correlations
shown in Table 6 indicated the following statistically significant differences:

1. Mean Difficulty Administered, using both original and revised item
   parameters, correlated significantly higher (p < .05) with UM-MGPA than did
   either Owen's Bayesian score or the Maximum Likelihood score. Number
   Correct on the conventional test correlated significantly higher (p < .05)
   with this criterion variable than did the Bayesian score using the
   original item parameters.

2. Mean Difficulty Correct, using both sets of item parameters,
   correlated significantly higher (p < .05) with UM-MGPA than did the Bayesian
   score; but it was not significantly higher than the Maximum Likelihood
   score. Using the original item parameters, the Maximum Likelihood score
   correlated higher with UM-MGPA than did the Bayesian score.

3. Mean Difficulty Administered and Mean Difficulty Correct correlated higher
   (p < .01) with UM-OGPA than did the Bayesian score, the Maximum Likelihood
   score, or the Number Correct score on the conventional test, for both the
   original and revised parameters.

4. Mean Difficulty Administered correlated significantly (p < .05) higher with
   ACT social science than did the Maximum Likelihood score using the revised
   item parameters.
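
The report does not say which test was used to compare these validity coefficients. One common choice when two predictors are correlated with the same criterion in the same sample is Hotelling's t test for dependent correlations, sketched below as an assumption rather than as the authors' method. The inputs in the example are taken from Tables 2, 3, and 6, and combining them is itself an approximation because Table 3 is based on all 101 students while UM-MGPA was available for 77.

```python
import math


def hotelling_t(r_xy, r_zy, r_xz, n):
    """Hotelling's t for the difference between two dependent correlations.

    r_xy, r_zy: correlations of predictors x and z with the same criterion y.
    r_xz: correlation between the two predictors.
    Returns (t, degrees of freedom), with df = n - 3.
    """
    det = 1.0 - r_xy ** 2 - r_zy ** 2 - r_xz ** 2 + 2.0 * r_xy * r_zy * r_xz
    t = (r_xy - r_zy) * math.sqrt((n - 3) * (1.0 + r_xz) / (2.0 * det))
    return t, n - 3


# Revised parameters, criterion UM-MGPA: Mean Difficulty Administered (.32)
# versus Owen's Bayesian (.24); the two scores intercorrelate .96.
t_value, df = hotelling_t(r_xy=0.32, r_zy=0.24, r_xz=0.96, n=77)
```

Because the two scores correlate .96 with each other, even a difference of .08 between validity coefficients yields a t of about 2.65 on 74 degrees of freedom under these assumptions, which is in line with the first difference listed above.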

The data in Table 6 show that none of the ability test scores correlated
highly with UM-OGPA; the highest correlation was r = .27. Since UM-OGPA was an
average across a wide variety of classes, frequently including substantial nonverbal
material, high correlations with the vocabulary tests would not be expected. To
determine whether the vocabulary tests correlated in the typically observed range
with a relevant GPA variable, the effect of the mathematics grade on UM-OGPA was
eliminated by computing the partial correlations of test scores with UM-OGPA, thus
partialling out the effects of UM-MGPA. These results are shown in Table 7.
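
For readers unfamiliar with partialling, the standard first-order partial correlation can be sketched as follows. The report does not give its computing formula, so this is the textbook definition rather than a reconstruction of the authors' procedure, and the input values in the example are made up for illustration; they are not taken from the report's tables.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with z partialled out of both.

    x: test score, y: criterion (e.g., overall GPA), z: the variable being
    removed (e.g., mathematics GPA).
    """
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


# Hypothetical illustration only.
example = partial_r(r_xy=0.50, r_xz=0.30, r_yz=0.40)
```

When the partialled variable is unrelated to both of the others, the partial correlation equals the original correlation; otherwise it can be higher or lower depending on the pattern of the three correlations.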

Table 7

Intercorrelations of UM-OGPA and UM-PGPA
with Scores from Stradaptive and Conventional Tests,
Partialling Out UM-MGPA

                                     Criterion Variable
Test and Score                       UM-OGPA   UM-PGPA

Stradaptive Test
  Original Item Parameters
    Mean Difficulty Administered       .27**     .51**
    Mean Difficulty Correct              .25     .49**
    Owen's Bayesian                      .14     .43**
    Maximum Likelihood                   .17     .43**
    SD Administered                     -.05       .10
    SE Owen's Bayesian                   .21     .36**
  Revised Item Parameters
    Mean Difficulty Administered       .27**     .50**
    Mean Difficulty Correct            .25**     .49**
    Owen's Bayesian                     .19*     .44**
    Maximum Likelihood                  .18*     .45**
    SD Administered                      .04       .19
    SE Owen's Bayesian                  .18*       .18
Conventional Test
  Number Correct                         .14     .36**

 *Statistically different from zero at p < .05.
**Statistically different from zero at p < .01.



As Table 7 shows, the partial correlations of all scores with GPA were higher
than were the original correlations. All ability estimate scores were significantly
correlated with UM-PGPA, using both original and revised item parameters for the
stradaptive test. In addition, the correlation of Number Correct on the
conventional test with UM-PGPA was also statistically different from zero. Correlations
of the stradaptive scores with UM-PGPA were still substantially higher than Number
Correct on the conventional test. The best stradaptive score (Mean Difficulty
Correct with original item parameters) accounted for 26% of criterion variance,
whereas Number Correct on the conventional test accounted for only 13% of criterion
variance.



Bayesian versus conventional. Table 8 presents validity correlations for the
Bayesian adaptive and conventional tests obtained from Group 2. On the average,
the Bayesian ability estimate correlated more highly with the external criteria
than did Number Correct on the conventional test. The Bayesian score correlated
significantly higher (at p < .05) with HS-GPA than did the conventional test score

Table 8

Correlations of Criterion Variables with Scores
from Bayesian Adaptive Test and Conventional Test

                                          Bayesian Test                   Conventional
                                                             Variance     Test
                             Number     Number   Ability   of Ability     Number
Criterion Variable     N  Administered  Correct  Estimate   Estimate      Correct

GPA
  HS                  71       .44**     .46**     .51**       .09         .40**
  UM-M               104       .23**     .20**     .22**      -.10           .16
  UM-O               131         .12       .08      .16*       .13           .13
ACT Score
  English             72       .42**     .41**     .48**       .12         .50**
  Math                71       .28**     .32**     .34**       .10         .33**
  Social Science      71       .43**     .48**     .62**       .17         .59**
  Natural Science     71       .40**     .40**     .50**       .15         .41**
  Composite           71       .49**     .51**     .62**       .16         .57**

 *Statistically different from zero at p < .05.
**Statistically different from zero at p < .01.



(r = .51 versus r = .40). UM-MGPA was also more accurately predicted by the Bayesian
score (r = .22) than by the conventional test score (r = .16), but the difference was
not statistically significant. No significant differences (at p < .05) were found
between the validity coefficients for the Bayesian ability estimate and the
conventional test Number Correct score in predicting UM-OGPA and the five ACT scores.
However, with the exception of ACT English, the ability scores from the Bayesian
adaptive test correlated higher with the criterion variables than did the score on
the conventional test.

Table 9
Correlations of UM-OGPA and UM-PGPA
with Scores from the Bayesian Adaptive
and Conventional Tests,
Partialling Out UM-MGPA

                            Criterion Variable
                           UM-OGPA     UM-PGPA
Test and Score             (N=131)     (N=100)

Bayesian Test
  Ability Estimate            .16*        .47*
Conventional Test
  Number Correct               .13       .44**

 *Statistically different from zero at p < .05.
**Statistically different from zero at p < .01.



Correlations of the Bayesian ability estimate and Number Correct score on the
conventional test with UM-PGPA are shown in Table 9. As was found in the Group 1
data, partialling out the effects of UM-MGPA resulted in higher correlations of
both test scores with the GPA variable. Correlations for both test scores
increased .31, and both partial correlations were significantly different from zero.
However, there still were no significant differences between the validity
correlations for the two tests.

DISCUSSION AND CONCLUSIONS

Testing Strategies

The major finding of this research was that the stradaptive and Bayesian adap-
tive testing strategies could predict to external criterion measures as accurately
as, and in some cases more accurately than, could the conventional test. In
achieving these equal or higher levels of validity, the stradaptive test used
approximately 25% fewer items, on the average, than did the conventional test. The
Bayesian adaptive test used 20% more items, on the average, than the conventional
test to achieve the same validity, although the median number of items administered
in the Bayesian test was 12.5% fewer than in the conventional test. There were no
significant differences between the stradaptive and Bayesian tests in terms of
their correlations with the external criterion variables. The stradaptive test,
using the Mean Difficulty Administered and Mean Difficulty Correct scores,
predicted to overall college GPA at a significantly higher level than did the
conventional test.
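
The test-length comparisons above are simple percentage differences relative to the 40-item conventional test. The sketch below reproduces that arithmetic from the lengths quoted in the report's summary (a mean of 48.7 and a median of 35 items for the Bayesian test); the function name `percent_difference` is an illustrative choice. The mean length works out to roughly 22% more items, which the text reports as 20%.

```python
def percent_difference(adaptive_length, conventional_length):
    """Percent by which the adaptive test is longer (+) or shorter (-)."""
    return 100.0 * (adaptive_length - conventional_length) / conventional_length


CONVENTIONAL_ITEMS = 40      # length of the conventional test
BAYESIAN_MEAN_ITEMS = 48.7   # mean Bayesian test length from the summary
BAYESIAN_MEDIAN_ITEMS = 35   # median Bayesian test length from the summary

for label, n_items in [("Bayesian mean", BAYESIAN_MEAN_ITEMS),
                       ("Bayesian median", BAYESIAN_MEDIAN_ITEMS)]:
    change = percent_difference(n_items, CONVENTIONAL_ITEMS)
    print(f"{label}: {change:+.1f}% relative to the conventional test")
```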

It may be argued that the differences in observed validities between the adap-
tive and conventional tests are a function of the higher item discriminations of
items administered in the adaptive test and, consequently, that a comparison be-
tween the two testing strategies that does not equate for discriminations is unfair
to the conventional test. What this criticism ignores, however, is that selecting
items of high discriminations from a large pool is one of the important advantages
of adaptive testing and cannot be denied to the procedure.

A conventional test constructed to have discriminations equal to those of the
items selected by the adaptive test would have at a specific point on the ability
scale (1) good fidelity and poor bandwidth if it were a peaked test or (2) good
bandwidth and poor fidelity if it had a rectangular distribution of item
difficulties (McBride, 1976). Either test would correlate poorly with a criterion
variable if there were any range of individual differences in the group being
measured. Thus, the adaptive test is designed to resolve this bandwidth-fidelity
dilemma by administering to each individual a test of high fidelity (high item
discriminations) at or near the individual's estimated ability level (i.e., in a
narrow bandwidth), with the location of the high fidelity measurement adapted to
each testee.

This argument regarding higher levels of validity for adaptive tests attribut-
able to higher item discriminations also does not take into account the somewhat
different findings obtained with the overall college GPA variable between the
stradaptive and Bayesian adaptive tests. Both adaptive tests tend to select the
most discriminating items in the pool that are closest to the individual's ability
level. Given that the average discriminations for the two adaptive procedures were
similar, the significant differences between them in predicting overall college GPA
in relation to the conventional test must have been due to their item selection
procedures, their scoring methods, or the interaction of these two test character-
istics.

The data in Table 6 suggest that the differences in the validities of the
adaptive test relative to overall college GPA might have been due to scoring
methods. On the average, the two mean difficulty scores used on the stradaptive
test had higher correlations with all criterion variables. These two scores, in
comparison to the Bayesian and Maximum Likelihood scores, are relatively simple
scores that do not use complex latent-trait-based calculations. The simple
average difficulty scores also do not utilize in their calculation the differing
discriminations of items administered. The effect may be a score that is less
sample-specific in that it is not optimized using explicit weights for both
difficulty and discrimination. Similar to multiple-regression-weighted composites,
such optimally weighted scores may be sample-specific (in this case, highly
dependent on the particular pattern of item responses and the specific values of
the item parameter estimates), resulting in lower correlations with complex
external criterion variables such as GPA. Another explanation may be that the
latent-trait item discrimination parameter is related to the first principal
component of an item set, and its use in scoring may result in a "factor pure"
score that would correlate lower with an external criterion (which, like GPA, is
likely not to be factorially pure) than would a score that is factorially somewhat
more complex.
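
As a concrete illustration of why the mean difficulty scores are called simple, the sketch below computes both from a single response record. It assumes, from the score names, that Mean Difficulty Administered averages the difficulties of all items given and Mean Difficulty Correct averages the difficulties of the items answered correctly. The function name and the response record are hypothetical.

```python
def mean_difficulty_scores(difficulties, responses):
    """Return (mean difficulty administered, mean difficulty correct).

    difficulties: b parameters of the items administered, in order
    responses:    1 for a correct answer, 0 for an incorrect answer
    """
    administered = sum(difficulties) / len(difficulties)
    correct = [b for b, u in zip(difficulties, responses) if u == 1]
    # The second score is undefined when no item was answered correctly.
    mean_correct = sum(correct) / len(correct) if correct else None
    return administered, mean_correct


# Hypothetical six-item record; note that no discrimination values are needed.
item_difficulties = [-0.5, 0.2, 0.9, 0.3, 0.8, 0.4]
item_responses = [1, 1, 0, 1, 0, 1]

mda, mdc = mean_difficulty_scores(item_difficulties, item_responses)
print(f"Mean Difficulty Administered = {mda:.2f}")
print(f"Mean Difficulty Correct      = {mdc:.2f}")
```

Neither score uses item discriminations or a likelihood function, which is the contrast with the Bayesian and Maximum Likelihood scores drawn above.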

It may also be argued that the higher validities obtained for the adaptive
test using the overall college GPA criterion were partially the result of the use
of estimated GPA to begin testing in the stradaptive test. This argument does not
take into account, however, the fact that the entry point information is not ex-
plicitly incorporated into the stradaptive test mean difficulty scores; it serves
only as a means of selecting the first item to be administered. After that item,
all subsequent item selection is based on the pattern of responses given by the in-
dividual. Entry point information in the stradaptive test might have a minor ef-
fect on the Mean Difficulty Administered score to the extent that the entry point
is an accurate estimate of the ability being measured (Table 3 shows that it corre-
lated .34 with conventional test scores and from .26 to .46 with adaptive test
scores); but it would have no direct effect on Mean Difficulty Correct scores,
since they are solely a function of ability level. In addition, this argument
would not explain the lower validity correlations for the Bayesian test as compared
to the stradaptive test, since the entry point (reported GPA) was explicitly in-
cluded in scoring the Bayesian test as a consequence of its use as a differential
prior ability estimate.
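
The way a differential prior enters a Bayesian ability estimate can be sketched as follows. This is not the scoring routine used in the study, whose details are not given here; it is a generic grid-based posterior mean under an assumed two-parameter logistic model with a normal prior, and the item parameters and responses are hypothetical.

```python
import math


def posterior_mean(responses, items, prior_mean, prior_sd=1.0):
    """Grid-based posterior mean of ability under a normal prior.

    responses: 1/0 item scores
    items:     (a, b) pairs for the administered items
    """
    grid = [-4.0 + 0.05 * k for k in range(161)]  # ability values from -4 to +4
    weights = []
    for theta in grid:
        # Log of the normal prior density, up to an additive constant.
        log_w = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        for u, (a, b) in zip(responses, items):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            log_w += math.log(p if u == 1 else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total


# Hypothetical item parameters and one response pattern.
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.5), (1.0, 1.0)]
responses = [1, 1, 0, 0]

low_prior = posterior_mean(responses, items, prior_mean=-1.0)
high_prior = posterior_mean(responses, items, prior_mean=1.0)
print(f"prior mean -1.0: estimate = {low_prior:.2f}")
print(f"prior mean +1.0: estimate = {high_prior:.2f}")
```

The same response pattern yields a higher estimate under the higher prior, so the entry information is carried directly into the score, whereas in a mean difficulty score it affects only which item is administered first.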

Data in Table 3 show that the simpler mean difficulty scores, however, con-
veyed almost the same information as the more complex latent trait scores; mean
difficulty scores correlated .96 to .97 with Bayesian and Maximum Likelihood
scores. The higher validities of the mean difficulty scores for most criteria, in
conjunction with these high correlations, suggest that the mean difficulty scores
from the stradaptive test may be as good for practical purposes as more complex
scoring methods. These results support those of Vale and Weiss (1975a, 1975b), who,
using other criteria and comparisons, found that Mean Difficulty Correct was a
very useful scoring method for stradaptive tests. Further research would be desir-
able to determine if these simpler scoring methods might be useful in other adap-
tive tests.

The data in Table 3 also show correlations of .97 and 1.00 between Bayesian
and Maximum Likelihood ability estimates. These correlations, based on response
records averaging about 30 items, are slightly higher than the correlation of .95
obtained by Kingsbury and Weiss (1979) in their comparison of Bayesian and Maximum
Likelihood logistic scoring of achievement test data using the three-parameter
model.

Item Parameter Estimates 



Comparing the two sets of item parameter estimates used to score the
stradaptive test by Bayesian and Maximum Likelihood methods was motivated by a
desire to examine the generality of the finding by Prestwood and Weiss (1977) that
the parameter estimation procedure suggested by Urry (1976), which corrected the
biserial correlations for guessing, produced scores that were essentially linear
transformations of the scores obtained by using parameter estimates that did not.
The data presented in Table 3 support the earlier conclusion. Correlations between
ability estimates based on the two sets of item parameter values were .98 for the
two latent-trait scoring methods. The data (Table 6) also show no general
differences in correlations of Bayesian and Maximum Likelihood scores with the cri-
terion variables when the scores were obtained from the original and revised item
parameter estimates; there were, however, slightly higher correlations with the two
college GPA variables when the new parameters were used, with the differences
tending to be larger for the Bayesian score. None of the differences between val-
idity correlations based on the two sets of item parameter estimates were, however,
statistically significant. The data, therefore, support the conclusion that the
two sets of item parameter estimates are essentially linear transformations of each
other, since they performed essentially equivalently in this study and correlated
highly in both the present study and the Prestwood and Weiss (1977) study.
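
The reasoning behind "essentially linear transformations" can be checked with a small sketch: if one set of scores is an exact linear transformation of another, the two sets correlate 1.00 with each other and have identical correlations with any criterion. All scores below are hypothetical, and the slope and intercept are arbitrary.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ability estimates from one set of item parameters.
original_scores = [-1.2, -0.4, 0.1, 0.7, 1.5]
# A second set that is an exact linear transformation of the first.
revised_scores = [0.3 + 1.1 * t for t in original_scores]
# Hypothetical criterion values (e.g., grade-point averages).
criterion = [2.1, 2.8, 2.5, 3.2, 3.6]

print(f"r(original, revised)   = {pearson(original_scores, revised_scores):.2f}")
print(f"r(original, criterion) = {pearson(original_scores, criterion):.2f}")
print(f"r(revised, criterion)  = {pearson(revised_scores, criterion):.2f}")
```

An observed correlation of .98 between the two sets of scores, rather than 1.00, leaves little room for their validities to differ.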



Reported GPA

A minor finding from this study indicates that self-reports of college GPA
have a degree of validity. Data in Table 6 show that GPA reported in the intervals
shown in Figure 1 correlated .59 with overall college GPA as obtained from univer-
sity records. These data suggest that, even when obtained under volunteer research
conditions, some confidence can be had in student-reported GPAs. The data also
show significant correlations of reported college GPA with ACT scores, correlations
which in some cases were not substantially different from those obtained from the
verbal ability tests administered.

Conclusions

The data show generally higher, and in some cases significantly higher, cri-
terion-related validities for the adaptive tests as compared to the conventional
tests. There is some suggestion in the data that scoring of the ability test item
responses by the Bayesian and Maximum Likelihood latent-trait scoring methods may
have reduced the validities of the adaptive test. In comparing the two adaptive
testing procedures, the data suggest that the stradaptive test scored by mean dif-
ficulty methods results in more valid ability estimates than the Bayesian adaptive
test.

This study has been one of the first evaluations of the criterion-related val-
idity of adaptive testing strategies. Thus, these conclusions must be considered
tentative until supported by additional research. Characteristics of the item
pools, decisions made in implementation of the adaptive strategies, design of the
conventional test, and characteristics of the sample may all have affected the re-
sults. Yet the obtained findings are consistent with a wide range of related re-
search using different samples, tests, and procedures, which shows important gains
in measurement precision and accuracy realized by the use of adaptive as compared
to conventional testing strategies.
 [?]  .72  1.07 3.00  1.46 |  568  .91  -.08 1.63   .29 |  116  .38  -.38  .49   .33 |   96 1.14 -1.88 1.13 -1.72 |  105  .91 -2.88  .98 -2.63
 319  .62  1.49 3.00  2.14 |  266  .87   .16 2.12   .51 |  252  .31  -.34  .42   .47 |  125 1.10 -2.13 1.24 -1.88 |  126  .88 -2.54  .96 -2.27
 652  .60  1.33 3.00  1.66 |  329  .87  -.21 1.42   .1? |   54  .30  -.67  .38   .20 |  129 1.08 -1.64 1.27 -1.35 |   80  .79 -2.55  .86 -2.25
 359  .58  1.54 3.00  2.07 |  161  .86  -.25 1.38   .13 | Mean  .65  -.66  .79  -.17 |   22 1.07 -2.23 1.20 -1.97 |  198  .74 -2.81  .80 -2.50
 288  .56  1.11 3.00  1.26 |  264  .86   .21 2.28   .55 |   SD  .28   .19  .24   .39 |  101 1.02 -1.67 1.17 -1.40 |    5  .69 -2.50  .75 -2.16
 152  .55  1.40 3.00  1.6? |  315  .83   .17 1.85   .52 | Stratum 3 (36 items)       |   44  .99 -1.71 1.15 -1.41 |   89  .67 -2.82  .12 -2.49
 162  .52  1.17 3.00  1.21 |  599  .81  -.23 1.63   .16 |  191 1.40 -1.51 1.75 -1.26 |  158  .98 -2.26 1.08 -2.00 |  184  .67 -2.54  .73 -2.19
 140  .52  1.30 3.00  1.38 |  340  .78   .30 1.92   .65 |  194 1.35 -1.23 1.79  -.96 |  134  .96 -2.21 1.6? -1.94 |   31  .66 -2.50  .72 -2.14
 263  .51  1.38 3.00  1.47 |  301  .76   .08 1.38   .47 |   36 1.23 -1.08 1.64  -.79 |  127  .93 -1.66 1.08 -1.35 |   63  .64 -2.51  .69 -2.14
 378  .49  1.44 3.00  1.48 |   56  .75  -.29 1.11   .44 |   51 1.16 -1.33 1.43 -1.04 |  186  .92 -1.65 1.07   [?] |  106  .62 -2.39  .67 -2.01
 291  .44  1.31 1.64  1.35 |   60  .66   .24 1.23   .64 |   40 1.02 -1.34 1.24 -1.03 |   90  .82 -1.65  .94 -1.31 |  202  .57 -2.58  .62 -2.17
 217  .43  1.25 1.25  1.38 |  271  .53   .33  .89   .80 |   87  .99 -1.10 1.24  -.76 |   66  .80 -2.32  .67 -2.02 |  131  .56 -2.98  .60 -2.58
 304  .42  1.00  .8?  1.34 |  377  .43  -.23  .59   .39 |  199  .92 -1.42 1.09 -1.09 |   83  .77 -1.80  .88 -1.45 |  628  .52 -2.73  .57 -2.29
 660  .41  1.01  .83  1.37 |  506  [?]  -.09  .81   .58 |   43  .91 -1.21 1.1?  -.86 |  559  .62 -1.80  .62 -1.68 |   62  .50 -2.77  .54 -2.31
 668  .39  1.26  .93  1.49 |  538  .42   .15 1.18   .52 |  109  .89 -1.06 1.11  -.70 |   34  .74 -1.93  .83 -1.58 |   93  .48 -2.68  .5? -2.18
 168  .37  4.36  .9?  1.55 |  133  .41  -.09  .57   .56 |  103  .89 -1.34 1.06 -1.00 |  262  .70 -2.29  .77 -1.93 |  643  .44 -2.56  .49 -2.03
 155  .36  1.24  .71  1.36 |  629  .40  -.26  .55   .42 |   47  .87 -1.31 1.04  -.96 |  311  .66 -1.83  .75 -1.43 |   81  .41 -2.95  .44 -2.39
 562  .35  1.60 3.00  1.22 |  655  .39   .08  .55   .74 |  239  .77 -1.10  .94  -.71 |   88  .63 -1.75  .71 -1.33 | Mean 1.28 -2.67 1.36 -2.38
 107  [?]  1.1?  .69  1.59 |  324  .37   .09  .52   .77 |   86  .77 -1.55  .89 -1.19 |  232  .59 -1.70  .67 -1.?5 |   SD  .93   .26  .92   .20
                               Table B
                    Item Discrimination (a) and
               Difficulty (b) Parameter Estimates for
               the Bayesian Adaptive Test Item Pool

Item                Item                Item                Item
 No.     a      b    No.     a      b    No.     a      b    No.     a      b

 100   .56  -3.55     95   .51  -2.20     87   .99  -1.10    302   .51    .37
 187   .45  -3.53     76   .56  -2.19     36  1.23  -1.08    666   .55    .42
   8   .93  -3.42    125  1.10  -2.13    293   .56  -1.07    111   .48    .46
 135   .40  -3.34    276   .41  -2.12     85   .76  -1.07    375   .49    .46
  16   .70  -3.26    214   .42  -2.08    109   .89  -1.06    651   .56    .49
 151   .41  -3.19    196  1.76  -1.99    110   .58  -1.04    164   .41    .62
  17   .68  -3.19     34   .74  -1.93    222   .54  -1.02    215   .48    .65
 121   .70  -3.11     27  1.23  -1.92     53   .52  -1.01    114   .77    .65
 131   .56  -2.98    641   .52  -1.89    123   .67  -1.00    238   .43    .65
  81   .41  -2.95     96  1.14  -1.88    183   .60   -.94    656   .44    .71
  65   .96  -2.94     84  1.43  -1.87    149   .67   -.91    337   .98    .73
 105   .91  -2.88    311   .66  -1.83    130   .75   -.85    341   .37    .75
 124  1.01  -2.87    141   .42  -1.83     33   .64   -.85    231   .45    .78
 181   .94  -2.83    642   .42  -1.80    203   .65   -.84    294   .70    .79
  89   .67  -2.82     83   .77  -1.80     46   .67   -.81    321   .63    .79
 198   .74  -2.81     13  1.54  -1.78    128   .82   -.75    397   .37    .83
  11  1.48  -2.81     88   .63  -1.75     37   .67   -.69    216   .37    .92
  99  1.26  -2.78    108   .47  -1.71     91   .83   -.59    299   .52    .98
  62   .50  -2.77     44   .99  -1.71    154   .66   -.58    304   .42   1.00
  68   .93  -2.74    232   .59  -1.70    292   .48   -.58    660   .41   1.01
 628   .52  -2.73    [?]  1.46  -1.68    143   .77    [?]    [?]   .72   1.07
  42  3.00  -2.72    101  1.02  -1.67    365   .66   -.5?    288   .56   1.11
  28  3.00  -2.72    127   .93  -1.66    391   .48   -.53    162   .52   1.17
  25  3.00  -2.72     90   .82  -1.65    270   .86   -.52    217   .43   1.25
  93   .48  -2.68    186   .92  -1.65    188   .71   -.47    140   .52   1.30
  14  1.79  -2.67    129  1.08  -1.64    145   .59   -.41    291   .44   1.31
 202   .57  -2.58    227   .71  -1.63    209   .64   -.40    652   .60   1.33
 643   .44  -2.56    189   .66  -1.60    104   .68   -.40    263   .51   1.38
  80   .79  -2.55     94   .49  -1.57    116   .38   -.38    152   .55   1.40
 184   .67  -2.54     86   .77  -1.55    318   .40   -.36    378   .49   1.44
 126   .88  -2.54    191  1.40  -1.51     56   .75   -.29    319   .62   1.49
  24  1.59  -2.54    640   .67  -1.47    629   .40   -.28    359   .58   1.54
  63   .64  -2.51    173   .76  -1.43    161   .86   -.25    381   .51   1.79
   5   .69  -2.50    199   .92  -1.42    377   .43   -.23    273   .49   1.79
  31   .66  -2.50    285   .71  -1.42    329   .87   -.21    115   .45   1.88
  70  1.16  -2.47    637   .75  -1.41    272   .98   -.13    672   .85   1.89
   9  1.29  -2.46     40  1.02  -1.34    133   .41   -.09    662   .57   1.93
 102  3.00  -2.45    103   .89  -1.34    630  1.31   -.05    166   .64   2.03
  64  3.00  -2.45     51  1.16  -1.33    301   .76    .08    336   .49   2.05
 206  1.01  -2.43     47   .87  -1.31    655   .39    .08    180   .43   2.07
  71  3.00  -2.42    671   .52  -1.31    324   .37    .09    274   .42   2.13
   7  3.00  -2.42    112   .52  -1.30    347  1.07    .14    297   .40   2.31
 106   .62  -2.39    235   .56  -1.27    283   .97    .15    328   .54   2.31
  66   .80  -2.32    287   .44  -1.27    266   .87    .16    385   .42   2.35
 262   .70  -2.29    194  1.35  -1.23    315   .83    .17    309   .48   2.47
 158   .98  -2.26     43   .91  -1.21    264   .86    .21    298   .43   2.62
  22  1.07  -2.23    117   .52  -1.19     60   .66    .24    627   .42   2.67
 138  1.51  -2.22    185   .57  -1.17    340   .78    .30    388   .43   2.86
 649   .44  -2.21    204   .73  -1.15    271   .53    .33    664   .84   2.95
 134   .96  -2.21    239   .77  -1.10    296   .91    .34    290   .42   3.38




                         Table C
        Item Discrimination (a) and Difficulty (b)
        Parameters for the Items in the Conventional
        Test, in Order of Administration

     Item
Reference No.        a          b

      58           .482      -.957
     221           .647      -.740
     307           .562      -.836
     386           .697       .136
     211           .609      -.720
     224           .543      -.785
     390           .627      -.731
     667           .568      -.726
     156           .647      -.631
     208           .582      -.681
     234           .512      -.687
      52           .606      -.282
     137           .400      -.739
     176           .338      -.897
     207           .601      -.526
     218           .332      -.928
     205           .472      -.618
     382           .638      -.481
     342           .774       .172
     265           .772       .173
     645           .501      -.320
     661           .579      -.296
     670           .620      -.282
     327           .571      -.248
      50           .505      -.234
     14*           .627      -.184
     369           .562      -.215
     233           .468      -.172
     139           .417       .189
     633           .501      -.078
     146           .607       .000
     295           .474      -.035
     113           .609       .247
     267           .436       .188
      59           .637       .173
     147           .383      1.152
     174           .638      1.156
     242           .310       .979
     306           .490       .969
     36T           .377       .978

    Mean           .543      -.188
      SD           .112       .593